Kinship identification using age transformation and Siamese network

Facial images are used for kinship verification. Traditional convolutional neural networks and transfer learning-based approaches are presently used for kinship identification. The transfer-learning approach is useful in many fields. However, it does not perform well in identifying human kinship because transfer-learning models are trained on data that differ significantly from human face images. A technique is therefore required that identifies kinship by comparing images of parents and their children after age transformation instead of comparing their actual images. In this article, a technique for kinship identification using a Siamese neural network and an age transformation algorithm is proposed. The results are satisfactory, with an overall accuracy of 76.38%. Further work can be carried out to improve the accuracy by improving the Lifespan Age Transformation Synthesis (LATS) algorithm for kinship identification using facial images.


INTRODUCTION
The pictorial data generated by businesses, social media, public industry, non-profit sectors, and scientific research has increased tremendously (Jin et al., 2015). This graphical data contains much useful information that could be used for various purposes (Chen, Mao & Liu, 2014; Emani, Cullot & Nicolle, 2015).
In the last few years, researchers have become interested in extracting kinship information from pictorial data containing human faces, because face images capture unique human features and contain a wealth of information that can be used for various purposes (Zhou et al., 2012a). The purpose of extracting genetic relationships between human images is to verify human kinship, which is useful for medical sciences, psychologists, security agencies and family album organization. Furthermore, it can be used in image annotation, searching for missing children, combating human trafficking, and solving immigration and border patrol problems (Zhou et al., 2012a; Lu et al., 2013; Yan et al., 2014).
Face recognition and verification have been an active area of research for the last two decades. They have been studied enthusiastically to make computers more capable as more intelligent applications have been developed for human-computer interface (HCI), security, ... the introduced technique is validated using the state-of-the-art benchmark dataset RFIW, and (4) finally, extensive experiments are conducted on the dataset using the proposed technique to assess its improved effectiveness. Moreover, the comparative analysis indicates that the proposed technique outperforms the existing methods.

RELATED WORK
Researchers in the computer vision community have studied kinship verification (family or not family) by applying different face recognition and machine learning techniques. Fang et al. (2010) introduced the problem and used simple features such as eye and skin color and the distances between facial parts for kinship verification. Subsequently, Xia et al. (2012) claimed that the similarity between parents and their children is quite large and proposed a kinship learning approach that reduces the gap between children's images and two facial images of a parent, one taken at a young age and the other at an old age. Lu et al. (2014) used a metric learning approach for kinship verification and identified effective features that provided the most discriminative results. Levi & Hassner (2015) proposed an age and gender classification methodology based on convolutional neural networks and obtained better results. Dehghan et al. (2014) proposed a genetic identification technique that determines the resemblance between parent and offspring via gated autoencoders. They used deep learning to learn the most discriminative features between parents and children. That approach models resemblance using the facial shapes of the father and mother and extracts a similar face from a combination of the facial features of both parents (Dehghan et al., 2014). It has also been reported that the Euclidean similarity metric is not a powerful way to measure the similarity of facial images, especially those captured in the wild, and that a similarity metric can handle face variations better than Euclidean similarity. That work used a mid-level feature vector with discriminative metric learning and proposed a prototype-based feature learning approach for kinship verification (Yan et al., 2014).

A methodology for video-based kinship verification has also been proposed, using a dataset of video faces called Kinship Face Videos in the Wild (KFVW). The dataset was built by capturing facial images from videos. The methodology analyzes the human faces in a video by drawing a training set from video poses and then applying distance metric learning to obtain a positive semidefinite (PSD) matrix for face recognition and kinship identification. Robinson et al. (2017) introduced the first large-scale image database for kinship recognition, Families In the Wild (FIW), and explored the challenges in kinship recognition. The FIW database consists of thousands of face images. A further framework transfers face recognition knowledge learned from large-scale data and then fine-tunes the metric space to discriminate kin-related people. Its authors also proposed an augmentation strategy to balance family members' images and used triplet loss and ResNet to extract face encodings for kinship identification.

Early kinship verification techniques used handcrafted descriptors from facial images to perform classification. Fang et al. (2010) used facial features such as eye and skin color and eye-to-nose distance. Zhou et al. (2012b) proposed an approach based on spatial pyramid features, using Gabor-based gradient orientation features of facial images. Liu et al. (2017) applied a transferable approach in which Fisher vectors derived from each facial image are used to measure similarity for kinship verification (Robinson et al., 2018). Kohli, Singh & Vatsa (2012) proposed a self-similarity descriptor to capture kinship similarity. They framed kinship verification as a two-factor classification problem and reported that low-level features could not be used as an underlying source of visual resemblance between people with kinship relations.

In shallow metric-based approaches, metric learning methods are used to learn discriminative features for kinship verification. These approaches learn a Mahalanobis distance on handcrafted features so that kin-related pairs obtain a better similarity score than non-kin pairs (Lu et al., 2013). Among deep learning-based approaches, the work of He et al. (2016) and Kohli et al. (2017) motivated kinship identification and verification after deep learning achieved impressive success in classifying facial images. Many techniques have been adopted for deep metric learning to obtain discriminative features for kinship verification. Dehghan et al. (2014) fused features using gated autoencoders and extracted optimal features that reflect parent-offspring resemblance. Wang et al. proposed kinship verification on Families in the Wild with Marginalized Denoising Metric Learning (DML). That technique was evaluated on the largest kinship verification dataset and used an auto-encoder and the Discriminative Low-rank Metric Learning (DLML) algorithm for feature discrimination. After metric learning, researchers found that convolutional neural networks offer a better way to measure similarity for kinship identification. Ghatas & Hemayed (2020) proposed GANKIN, which generates kin faces using a disentangled GAN and an image synthesis approach from parents to children; they also used a pretrained FaceNet and a GAN. Nguyen, Nguyen & Dao (2020) proposed an approach for recognizing families through images with pretrained encoders. They used the pretrained networks FaceNet, Siamese and VGG to obtain face image encodings and find kinship between facial images.
Given the efficiency of GAN-based approaches, we also used a GAN-based age transformation algorithm and a Siamese network to build and train our model. Although the methodologies proposed for kinship identification and verification in the last few years have produced some encouraging results, automatic kinship verification still performs poorly in real-world, everyday applications. Because large-scale datasets are not available, results are not accurate enough to handle kinship identification problems. Existing datasets such as Family101, UB KinFace, Cornell KinFace, KinFaceW-I, and KinFaceW-II provide a few examples, but they fail to capture accurate distributions of genetic or kinship relationships. Moreover, they contain a limited number of parent-child image pairs; a classifier trained on such a small dataset fails when recognizing real-world images.
To handle these issues, we propose an approach to find the kinship relationship between parents and children. Our methodology uses age transformation to convert images of parents and children to the age of 15-19, because images at this age show the most facial features, which makes them a good source of discriminative features. After age transformation converts the facial images of both parents and children to a young age, the faces become closer to each other in look and expression, which makes it easier to find the similarity between them. With these images, the parents' and children's faces are much more likely to be close to each other, which in turn makes it easier for the face encoder to generate close face encodings. As a result, we obtain a low distance value when computing cosine similarity. Figure 2 shows the effect of age transformation.
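The comparison of face encodings by cosine similarity can be illustrated with a minimal sketch. The 4-dimensional vectors below are toy values chosen for illustration (the model described in this article produces 128-dimensional encodings), and the function names are hypothetical rather than taken from the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length encoding vectors."""
    if len(u) != len(v):
        raise ValueError("encodings must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("encodings must be non-zero")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Distance derived from cosine similarity: small when encodings align."""
    return 1.0 - cosine_similarity(u, v)


# Toy encodings (assumed values, not real model output).
parent = [0.9, 0.1, 0.3, 0.4]
child = [0.8, 0.2, 0.3, 0.5]
stranger = [-0.2, 0.9, -0.5, 0.1]

kin_distance = cosine_distance(parent, child)
non_kin_distance = cosine_distance(parent, stranger)
```

In this toy case the parent-child distance is smaller than the parent-stranger distance, which is the behavior the age transformation step is intended to encourage for real kin pairs.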

PROPOSED WORK
This section outlines the proposed methodology for kinship identification. We present a deep relational network model that adds a preprocessing stage of age transformation to the two facial images before comparing them, in order to extract kinship relationships from facial images. The scheme first transforms the facial images by increasing or decreasing the age factor so that both images are at the same age stage and then compares them to find and verify kinship. After transforming the facial images, we use a Siamese network with two convolutional neural networks that share parameters. The network then extracts features at different scales to find similarities between images using triplet loss. We conducted experiments on a widely used facial kinship dataset, RFIW. The proposed model uses age transformation to convert facial images to the same age stage, between 15 and 19 years. We chose this age because, in this period, a person's face looks strong and provides clear facial features, which allows facial images to be encoded better. After encoding the transformed faces, we applied triplet loss to three faces of parents and children and extracted the kinship relationship between them. We used the parents' images as the anchor and negative parts of the triplet and the children's images as the positive part. We fixed the positions of the father and mother as positive or negative to each other while training the Siamese network. The age transformation algorithm provides close pairs of facial images of parents and children for kinship identification. A graphical representation of the workflow of the proposed methodology is depicted in Fig. 3.

Model training
The proposed model uses age transformation and feature encoding of face images with triplet loss to extract facial similarity and identify kinship. The first stage converts the two input facial images to images of persons aged approximately 15-19 years. The converted images are then processed by the Siamese network to extract feature encodings. The network uses ResNet 50 with two fully connected layers and one dense output layer to extract features. It extracts a feature vector of 128 × 128 for all input facial images and uses triplet loss to discriminate features for kinship identification: it maximizes the distance between the anchor image and the negative image and minimizes the distance to the positive image. The size of the input images is 224 × 224 × 3 and the feature vector returned by the Siamese network has size 128. During training, not all positive or negative pairs are equally important for hard sample selection; pairs with a higher loss may have more impact on model training.

The training set is defined as follows. Let $X_a$, $X_p$ and $X_n$ be finite sets of father, child and mother images, each containing $m$ images, where $X_a$ is the set of anchor images taken from the father images, $X_p$ is the set of positive images taken from the children's images, and $X_n$ is the set of negative images taken from the mother images. Input samples drawn from these three sets form the set of triplets $X$:

$$X = \{(x^a_1, x^p_1, x^n_1),\ (x^a_2, x^p_3, x^n_4),\ \ldots,\ (x^a_n, x^p_n, x^n_n)\}$$

Each member of $X$ is a triplet ordered as anchor $x^a$, positive $x^p$ and negative $x^n$, that is, images of the father, child and mother respectively. After extracting features with the pretrained Siamese network, we obtain the set of features:

$$F(X) = \{[f(x^a_1), f(x^p_1), f(x^n_1)],\ [f(x^a_2), f(x^p_2), f(x^n_2)],\ \ldots,\ [f(x^a_n), f(x^p_n), f(x^n_n)]\}$$

This sequence is used to extract the similarity of children to the father and mother and thus obtain the kinship relations Father-Son (F-S), Father-Daughter (F-D), Mother-Son (M-S) and Mother-Daughter (M-D).

For sibling relationships, we changed the sequence of the triplet set. We took one sibling image as the anchor, one as the positive and, where a third sibling image did not exist, a random image from the set of mother or father images as the negative. The random negative images are therefore drawn from $X_r = P\{X_a \mid X_n\}$. The set of triplets for the sibling relationships Brother-Brother (B-B), Sister-Sister (S-S) and Brother-Sister (B-S) is then:

$$X_s = \{(x^a_1, x^p_2, x^r_1),\ (x^a_2, x^p_3, x^r_2),\ \ldots,\ (x^a_n, x^p_m, x^r_n)\}$$

where each member of $X_s$ is a triplet with anchor $x^a$, positive $x^p$ and negative $x^r$.
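The triplet construction described in this section can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the function names and file names are hypothetical, parent-child triplets are formed index by index as in $F(X)$, and sibling triplets pair each sibling with the next one and draw the negative at random from the father and mother images.

```python
import random


def parent_child_triplets(fathers, children, mothers):
    """Build (anchor, positive, negative) = (father, child, mother) triplets."""
    return list(zip(fathers, children, mothers))


def sibling_triplets(siblings, fathers, mothers, rng):
    """Build (sibling, next sibling, random parent image) triplets."""
    pool = list(fathers) + list(mothers)  # random negatives from X_a or X_n
    return [
        (siblings[i], siblings[i + 1], rng.choice(pool))
        for i in range(len(siblings) - 1)
    ]


# Hypothetical file names standing in for age-transformed images.
fathers = ["father_1.png", "father_2.png"]
children = ["child_1.png", "child_2.png"]
mothers = ["mother_1.png", "mother_2.png"]
siblings = ["sib_1.png", "sib_2.png", "sib_3.png"]

pc_triplets = parent_child_triplets(fathers, children, mothers)
sib_triplets = sibling_triplets(siblings, fathers, mothers, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the choice of negatives reproducible between runs, which is convenient when comparing training configurations.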

Loss function
The triplet loss on the extracted features is defined for three cases.

1. Father and child. Let $D_f$ be the distance between the child's image and the father's image and $D_m$ the distance between the child's image and the mother's image, computed on the features $f(A)$, $f(P)$ and $f(N)$ of the father, child and mother, where $A$, $P$ and $N$ are the anchor, positive and negative images and $m$ is a margin hyperparameter. If the father's image is closer to the child's image, we decrease the distance between the child's and the father's images and increase the distance between the child's and the mother's images, so the loss takes the same hinge form as Eq. (11):

$$\mathcal{L}(A, P, N) = \max(D_f - D_m + m,\ 0)$$

2. Mother and child. To check the similarity of children to mothers, we reverse the loss function: we decrease the distance between the child's and the mother's images and increase the distance between the child's and the father's images, so the loss becomes:

$$\mathcal{L}(A, P, N) = \max(D_m - D_f + m,\ 0)$$

3. Siblings. To compare images of two siblings, $S_1$ and $S_2$, we use the features $f(S_1)$, $f(S_2)$ and $f(N_r)$ of the siblings and a random image, respectively. $D_{s1}$ is the distance between one sibling and the other sibling, and $D_{s2}$ is the distance between a sibling and the random image. With margin $m$ as a hyperparameter, the loss function is:

$$\mathcal{L}(A, P, N) = \max(D_{s1} - D_{s2} + m,\ 0) \qquad (11)$$

Minimizing this loss reduces the distance between the first and second siblings.
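The three cases share one hinge form, which a short sketch can make concrete. The Euclidean distance, the margin of 0.2 and the 2-dimensional feature vectors are assumptions made for illustration only; the text does not state which distance the loss uses, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(d_pos, d_neg, margin):
    """Hinge loss max(d_pos - d_neg + margin, 0), the form of Eq. (11)."""
    return max(d_pos - d_neg + margin, 0.0)


# Toy features (assumed values): the child is closer to the father.
f_father = [0.0, 0.0]
f_child = [0.3, 0.4]
f_mother = [1.5, 2.0]
margin = 0.2  # hypothetical margin value

d_f = euclidean(f_child, f_father)  # child-father distance
d_m = euclidean(f_child, f_mother)  # child-mother distance

loss_father_child = triplet_loss(d_f, d_m, margin)  # case 1
loss_mother_child = triplet_loss(d_m, d_f, margin)  # case 2, roles reversed
```

For the sibling case the same function applies with the sibling-sibling distance as `d_pos` and the sibling-random-image distance as `d_neg`. In the toy values above the father-child loss is zero because the margin is already satisfied, while the reversed mother-child loss is positive.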

Network structure
To select information from different scales of features as input to the relational network, we use the pretrained Siamese network and obtain a feature map of size $\mathbb{R}^{512 \times 1}$. The network contains three dense layers that downsample the feature map to a feature vector of size 128 × 1. Each 128 × 1 feature vector encodes the information of a face. This face encoding is then used to find the cosine similarity between face images. After that, the relational network analyzes the selected features with multi-layer perceptrons, which consist of fully connected layers and ReLU activation functions. The steps of model training are as follows:
1. Pictorial data is fetched from the dataset and all images are converted to the same age stage by LATS. After age transformation, an intermediate dataset is prepared for training from the original images.
2. The transformed data is fetched into three vectors (father, mother and children) to prepare a triplet for the Siamese network.
3. One vector is used as the positive, one as the negative and one as the anchor.
4. The triplet is used by the Siamese network to extract face features.
5. The triplet loss function is defined. It decreases the distance between positive and anchor images and increases the distance between positive and negative images.
6. Training and evaluation are set up.
7. The multi-layer perceptron extracts the relation of features and outputs features of size $\mathbb{R}^{128 \times 1}$. We then compare these features at the element level to represent the distance between the features of faces.
8. Lastly, we use another multi-layer perceptron to find the similarity of faces for kinship identification from the relation of different face images. It also consists of fully connected layers and ReLU activation functions.
The flow of model training is shown in Fig. 4. The CNN structure uses a Siamese network; its input size is 3 × 256 × 256 and its final output feature vector size is 128 × 1. The network has three successive dense layers with batch normalization and ReLU activation to reduce the size of the feature vector.
The relational network has three convolutional layers; each layer uses the 128-dimensional feature vectors of 10 images with batch normalization and ReLU activation. The input feature size of each layer is $\mathbb{R}^{10 \times 128 \times 128 \times 128}$ and the last dense layer has an output feature size of 1 × 128. It applies a sigmoid function to establish the kin relationship between images. The detailed relational network with its input parameters is given in Table 1.
To optimize the network, a contrastive loss function is used, in which $L$ denotes the loss, $N$ the number of samples, $y_i$ the ground truth of the $i$-th sample, $d_i$ the distance between the encoder outputs, and $m$ the margin parameter. Similar face images are pushed close together and dissimilar images are pushed apart to obtain maximum similarity between similar images.
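The formula for the contrastive loss is not reproduced in the text above. As a minimal sketch, one widely used form that matches the listed symbols is $L = \frac{1}{2N}\sum_{i=1}^{N}\left[y_i d_i^2 + (1 - y_i)\max(m - d_i, 0)^2\right]$. Whether this is exactly the variant used in this work is an assumption, as are the convention that $y_i = 1$ marks a kin pair and the default margin of 1.0.

```python
def contrastive_loss(distances, labels, margin=1.0):
    """Contrastive loss averaged over N pairs (assumed standard form).

    distances[i] is the encoder-output distance d_i for pair i.
    labels[i] is 1 for a similar (kin) pair and 0 for a dissimilar pair;
    this label convention is an assumption, not stated in the text.
    """
    if len(distances) != len(labels) or not distances:
        raise ValueError("need equally sized, non-empty inputs")
    total = 0.0
    for d, y in zip(distances, labels):
        similar_term = y * d ** 2                          # pull kin pairs together
        dissimilar_term = (1 - y) * max(margin - d, 0.0) ** 2  # push others apart
        total += similar_term + dissimilar_term
    return total / (2 * len(distances))


# A kin pair at distance 0 and a non-kin pair beyond the margin cost nothing.
ideal = contrastive_loss([0.0, 2.0], [1, 0])
# A distant kin pair and a coincident non-kin pair are both penalized.
poor = contrastive_loss([1.0, 0.0], [1, 0])
```

With this form, dissimilar pairs stop contributing once their distance exceeds the margin, so training effort concentrates on pairs that are still confusable.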

DATA SET
We used the RFIW dataset and took images of 200 families with good resolution, because Lifespan Age Transformation Synthesis (LATS) requires good-resolution images. LATS generates ten age clusters, and each age cluster has ten images. We picked the age cluster between 15 and 19 years, so we used 10 images per member to train our model. We chose this age range because a person's face looks strong in this period and provides clear facial features, which yields a better encoding of facial images. For model training, we used images of 200 families; each family has four members on average. For each member we used 10 transformed images, so our model is trained on approximately 200 × 4 × 10 = 8,000 images. From this pool of data, we used 30% for validation.
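The dataset arithmetic above can be checked with a few lines. The variable names are illustrative, and the reading that the remaining 70% of the pool is used for training is an assumption, since the text states only the validation share.

```python
families = 200
members_per_family = 4    # average stated in the text
images_per_member = 10    # images in the selected 15-19 age cluster
validation_percent = 30   # share of the pool held out for validation

total_images = families * members_per_family * images_per_member
validation_images = total_images * validation_percent // 100
training_images = total_images - validation_images  # assumed remainder
```

Under these figures the pool holds 8,000 images, of which 2,400 are held out for validation and 5,600 remain for training.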
As we used the LATS model for preprocessing and Siamese Network for training our model, which are CNN based network architecture, therefore we also adopted CNN

RESULTS AND DISCUSSION
A CNN-based deep relational network is used to extract features from the facial images of the dataset. Table 1 outlines the parameters of the CNN-based deep relational network. Unlike previously existing models, our model explicitly establishes relations between three feature maps rather than within a single one. Additionally, the model takes ten images of each member and computes the triplet loss on the 128-dimensional feature maps of these ten images. In total, 30 feature maps are used for one comparison to find the similarity between members. The proposed model delivers its best performance with this methodology.
In this section, we list the experiments and the results achieved with the proposed technique, which combines a deep relational network with the LATS age transformation algorithm (Or-El et al., 2020). We used the large Recognizing Families in the Wild (RFIW) dataset for training and validation. In the first phase, we transformed the RFIW facial images to different life stages and then brought the images to the same age stage by increasing or decreasing the age factor. In the second phase, we trained our algorithm by comparing two images and evaluating metrics and parameter settings to measure kinship relation accuracy.

Age transformation
For age transformation, we employed the Lifespan Age Transformation Synthesis algorithm proposed by Or-El et al. (2020). Using this algorithm, which converts images to different stages of life, we prepared our dataset images for comparison. Table 2 outlines the training and validation accuracies observed for different relationships using the proposed model. Similarly, Table 3 presents the results observed on the baseline dataset. Compared with models trained on the RFIW dataset, the results in Table 3 indicate that our proposed model performs better than the existing state-of-the-art models by improving the overall accuracy.
Previous models achieved accuracies of up to 73.21%, whereas the proposed model outperformed them with an accuracy of 76.38%. We plan to further improve the model's accuracy in the future by improving the underlying relational network and applying it to transformed images at the same age stage.
The major contribution of our research is a robust way of identifying kinship by comparing images of parents and their children after age transformation instead of comparing their actual images. The improved accuracy indicates that better kinship identification results can be obtained by comparing age-transformed images rather than the original images. The results obtained after training also indicate that similarity between members of the same gender is greater than between members of opposite genders: the similarity scores for father-son and mother-daughter are greater than those for father-daughter and mother-son, respectively. Thus a daughter looks more similar to the mother than to the father, and a son looks more similar to the father than to the mother.

CONCLUSION
Kinship identification from facial images is used for kinship verification. Previous studies have explored this area with transfer learning-based solutions. This study presents a different approach to kinship verification.
In this study, we introduced a technique that uses a pretrained LATS model together with a Siamese network for kinship identification. We employed age transformation to find similarities between parents and children. Extensive experiments were used to validate the performance of the proposed model. The comparative analysis with previous studies shows that our model outperformed the existing state-of-the-art models using a similar approach, delivering an overall accuracy of 76.38%. In the future, we aim to improve model performance by improving the underlying relational network and applying it to transformed images at the same age stage.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The authors received no funding for this work.